Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings if the time series are multivariate (MTS), due to dependencies
between attributes, or if the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust \emph{time series cluster kernel} (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMMs to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data and outstanding results for missing data.Comment: 23 pages, 6 figure
Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks
Clinical measurements that can be represented as time series constitute an
important fraction of the electronic health records and are often both
uncertain and incomplete. Recurrent neural networks are a special class of
neural networks that are particularly suitable for processing time series data but,
in their original formulation, cannot explicitly deal with missing data. In
this paper, we explore imputation strategies for handling missing values in
classifiers based on recurrent neural networks (RNNs) and apply a recently
proposed recurrent architecture, the Gated Recurrent Unit with Decay,
specifically designed to handle missing data. We focus on the problem of
detecting surgical site infection in patients by analyzing time series of their
blood sample measurements and we compare the results obtained with different
RNN-based classifiers.
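A common baseline imputation strategy for RNN inputs is last-observation-carried-forward combined with a binary observation mask fed as extra input channels (the mask is also one of the signals GRU-D-style models consume). The sketch below is illustrative, not the authors' exact preprocessing pipeline.

```python
import numpy as np

def impute_forward_fill(ts, fallback=0.0):
    """Forward-fill imputation for one multivariate time series of
    shape (T, d), with NaNs marking missing values. Returns the imputed
    series plus a binary observation mask that an RNN can take as
    additional channels."""
    ts = ts.copy()
    mask = (~np.isnan(ts)).astype(float)   # 1 = observed, 0 = missing
    last = np.full(ts.shape[1], fallback)  # value used before first observation
    for t in range(ts.shape[0]):
        obs = ~np.isnan(ts[t])
        last[obs] = ts[t, obs]             # remember latest observed values
        ts[t] = np.where(obs, ts[t], last) # fill gaps with them
    return ts, mask

ts = np.array([[1.0, np.nan],
               [np.nan, 2.0],
               [3.0, np.nan]])
filled, mask = impute_forward_fill(ts)
```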
Noisy multi-label semi-supervised dimensionality reduction
Noisy labeled data represent a rich source of information that is often
easily accessible and cheap to obtain, but label noise can have many
negative consequences if not accounted for. How to fully utilize noisy labels
has been studied extensively within the framework of standard supervised
machine learning over a period of several decades. However, very little
research has been conducted on solving the challenge posed by noisy labels in
non-standard settings. This includes situations where only a fraction of the
samples are labeled (semi-supervised) and each high-dimensional sample is
associated with multiple labels. In this work, we present a novel
semi-supervised and multi-label dimensionality reduction method that
effectively utilizes information from both noisy multi-labels and unlabeled
data. With the proposed Noisy multi-label semi-supervised dimensionality
reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled
data are labeled simultaneously via a specially designed label propagation
algorithm. NMLSDR then learns a projection matrix for reducing the
dimensionality by maximizing the dependence between the enlarged and denoised
multi-label space and the features in the projected space. Extensive
experiments on synthetic data, benchmark datasets, as well as a real-world case
study, demonstrate the effectiveness of the proposed algorithm and show that it
outperforms state-of-the-art multi-label feature extraction algorithms.
Comment: 38 pages
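The label propagation step that NMLSDR builds on can be illustrated with the standard graph-based scheme: a soft multi-label matrix is repeatedly smoothed over a similarity graph while partially clamping the given (possibly noisy) labels. This is only the generic mechanism; NMLSDR's propagation additionally denoises the labels, and the subsequent projection step is omitted here.

```python
import numpy as np

def propagate_multilabels(W, Y, alpha=0.8, n_iter=50):
    """Basic multi-label label propagation sketch. W is an (n, n)
    similarity graph, Y an (n, L) soft label matrix with zero rows for
    unlabeled samples; alpha trades graph smoothing against clamping."""
    S = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y  # diffuse, then re-inject labels
    return F

# Tiny graph: two clusters {0,1} and {2,3}; only samples 0 and 3 labeled.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Y = np.zeros((4, 2))
Y[0, 0] = 1.0   # sample 0 carries label 0
Y[3, 1] = 1.0   # sample 3 carries label 1
F = propagate_multilabels(W, Y)
```

After propagation the unlabeled samples 1 and 2 inherit the labels of the cluster they sit in, which is exactly the "unlabeled data are labeled" part of the abstract.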
Time series cluster kernels to exploit informative missingness and incomplete label information
The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture
models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing
values without resorting to imputation and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning.
However, TCK assumes that data are missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as
medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our
approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited.
Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label
information to learn more accurate similarities.
Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal
electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the
effectiveness of the proposed method.
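One simple way to inject incomplete label information into a precomputed similarity kernel is to boost entries for pairs known to share a class and shrink entries for pairs known to differ, leaving pairs involving unlabeled samples untouched. The sketch below is an illustrative mechanism of this kind, not the paper's actual semi-supervised kernel formulation.

```python
import numpy as np

def semisupervised_kernel(K, labels, gamma=0.5):
    """Blend a precomputed kernel with partial label information.
    labels: integer class per sample, -1 for unlabeled. gamma controls
    how strongly known label (dis)agreements override the kernel."""
    K = K.copy()
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] == -1 or labels[j] == -1:
                continue  # keep diagonal and unlabeled pairs as-is
            if labels[i] == labels[j]:
                K[i, j] = (1 - gamma) * K[i, j] + gamma  # pull toward 1
            else:
                K[i, j] = (1 - gamma) * K[i, j]          # shrink toward 0
    return K

K = np.array([[1.0, 0.4, 0.6],
              [0.4, 1.0, 0.2],
              [0.6, 0.2, 1.0]])
labels = np.array([0, 0, -1])   # third sample unlabeled
K2 = semisupervised_kernel(K, labels)
```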
On the Use of Time Series Kernel and Dimensionality Reduction to Identify the Acquisition of Antimicrobial Multidrug Resistance in the Intensive Care Unit
Presentation at the 2021 KDD Workshop on Applied Data Science for Healthcare, 15.08.21 - 16.08.21. https://dshealthkdd.github.io/dshealth-2021/
The acquisition of Antimicrobial Multidrug Resistance (AMR) in
patients admitted to the Intensive Care Units (ICU) is a major global
concern. This study analyses data in the form of multivariate time
series (MTS) from 3476 patients recorded at the ICU of University
Hospital of Fuenlabrada (Madrid) from 2004 to 2020. 18% of the
patients acquired AMR during their stay in the ICU. The goal of this
paper is the early prediction of AMR development. Towards
that end, we leverage the time-series cluster kernel (TCK) to learn
similarities between MTS. To evaluate the effectiveness of TCK as
a kernel, we applied several dimensionality reduction techniques
for visualization and classification tasks. The experimental results
show that TCK allows identifying a group of patients who acquire
AMR during the first 48 hours of their ICU stay, and that it also
provides good classification capabilities.
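Dimensionality reduction for visualization works directly on the TCK Gram matrix because kernel methods such as kernel PCA accept a precomputed kernel. In this sketch a generic RBF kernel on random feature vectors stands in for a TCK matrix, purely for illustration.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

# Stand-in for a TCK Gram matrix: any positive semi-definite similarity
# matrix between patients would be used the same way.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 5), rng.randn(30, 5) + 4.0])
K = rbf_kernel(X, gamma=0.1)

# 2-D embedding computed directly from the precomputed kernel,
# e.g. to visualize which patients acquire AMR early.
embedding = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
```

The same precomputed kernel can be passed to an SVM (`SVC(kernel="precomputed")`) for the classification task the abstract mentions.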
On the Differential Analysis of Enterprise Valuation Methods as a Guideline for Unlisted Companies Assessment (I): Empowering Discounted Cash Flow Valuation
The Discounted Cash Flow (DCF) method is probably the most widely used approach
in company valuation; its main drawback is a well-known extreme sensitivity to key
variables such as the Weighted Average Cost of Capital (WACC) and Free Cash Flow (FCF)
estimations, which cannot be obtained beyond question. In this paper we propose an unbiased and systematic DCF method
which allows us to value private equity by leveraging stock market evidence, based on a
twofold approach: first, the inverse method assesses the existence of a coherent WACC
that positively compares with market observations; second, different FCF forecasting methods
are benchmarked and shown to correspond with actual valuations. We use financial historical
data including 42 companies in five sectors, extracted from Eikon-Reuters. Our results show that
WACC and FCF forecasting are not coherent with market expectations over time, across sectors,
or across market regions, when only historical and endogenous variables are taken into account.
The best estimates are found when exogenous variables, operational normalization of input space,
and data-driven linear techniques are considered (Root Mean Square Error of 6.51). Our method
suggests that FCFs and their positive alignment with Market Capitalization and the subordinate
enterprise value are the most influential variables. The fine-tuning of the methods presented here,
along with an exhaustive analysis using nonlinear machine-learning techniques, are developed and
discussed in the companion paper.
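The textbook DCF enterprise value that the paper stress-tests discounts forecast FCFs at the WACC and adds a Gordon-growth terminal value. A minimal sketch (the paper's calibrated variant differs in how WACC and FCFs are estimated):

```python
def discounted_cash_flow(fcfs, wacc, terminal_growth=0.0):
    """Plain DCF enterprise value: present value of forecast free cash
    flows plus a Gordon-growth terminal value discounted to today."""
    if terminal_growth >= wacc:
        raise ValueError("terminal growth must be below WACC")
    # Present value of the explicit forecast period
    pv = sum(fcf / (1 + wacc) ** t for t, fcf in enumerate(fcfs, start=1))
    # Gordon-growth terminal value at the forecast horizon
    terminal = fcfs[-1] * (1 + terminal_growth) / (wacc - terminal_growth)
    pv += terminal / (1 + wacc) ** len(fcfs)
    return pv

value = discounted_cash_flow([100.0, 110.0, 121.0], wacc=0.10, terminal_growth=0.02)
```

The abstract's point about extreme sensitivity is easy to verify with this function: small perturbations of `wacc` or `terminal_growth` move the valuation substantially, since the terminal value divides by their difference.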
Opening the 21st Century Technologies to Industries: On the Special Issue Machine Learning for Society
Machine learning techniques, more commonly known today as artificial intelligence, are playing an increasingly important role in all aspects of our lives. Their applications extend to all areas of society where similar techniques can be accommodated to provide efficient and interesting solutions to a wide range of problems. In this Special Issue entitled Machine Learning for Society [1], we present some examples of the applications of this type of technique. From the valuation of unlisted companies to the characterization of clients, through the detection of financial crises or the prediction of exchange-rate behavior, the works presented here have in common the search for efficient solutions based on sets of historical data and the application of artificial intelligence techniques. The techniques and datasets used, as well as the relevant findings developed in the different articles of this Special Issue, are summarized below.
On the Differential Analysis of Enterprise Valuation Methods as a Guideline for Unlisted Companies Assessment (II): Applying Machine-Learning Techniques for Unbiased Enterprise Value Assessment
The search for an unbiased company valuation method to reduce uncertainty, whether
or not it is automatic, has been a relevant topic in social sciences and business development for
decades. Many methods have been described in the literature, but consensus has not been reached.
In the companion paper we aimed to review the assessment capabilities of the traditional company
valuation model, based on a company's intrinsic value, using the Discounted Cash Flow (DCF) method.
In this paper, we capitalized on the potential of exogenous information combined with Machine
Learning (ML) techniques. To do so, we performed an extensive analysis to evaluate the predictive
capabilities with up to 18 different ML techniques. Endogenous variables (features) related to
value creation (DCF) proved to be crucial elements for the models, while the incorporation of
exogenous, industry/country-specific ones incrementally improves the ML performance. Bagging
Trees, Support Vector Machine Regression, and Gaussian Process Regression methods consistently
provided the best results. We concluded that an unbiased model can be created based on endogenous
and exogenous information to build a reference framework, to price and benchmark Enterprise Value
for valuation and credit risk assessment.
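A benchmark of the kind described above can be sketched with scikit-learn's standard regressors and cross-validated RMSE. The data here are a synthetic stand-in for the paper's valuation features (endogenous plus exogenous variables); only the comparison mechanics are illustrated.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

# Toy stand-in for the valuation dataset: six features with a
# noisy linear target (illustration only, not the Eikon-Reuters data).
rng = np.random.RandomState(0)
X = rng.randn(120, 6)
y = X @ rng.randn(6) + 0.1 * rng.randn(120)

models = {
    "bagging_trees": BaggingRegressor(random_state=0),
    "svr": SVR(),
    "gpr": GaussianProcessRegressor(random_state=0),
}
# 5-fold cross-validated RMSE per model, lower is better
rmse = {
    name: float(np.sqrt(-cross_val_score(
        m, X, y, cv=5, scoring="neg_mean_squared_error").mean()))
    for name, m in models.items()
}
```

Extending the comparison to the 18 techniques mentioned in the abstract is just a matter of enlarging the `models` dictionary.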